A lot of things could be missing, wrong or stupid, please let me know if this is the case.
Tell me how you like the idea and if you have any suggestions. Also, start thinking about doing something yourself too. In fact, I think it's even OK to send a huge article (somebody else's) on a subject, bit by bit every day (just copy and paste); as long as we all read it and discuss it, it's worth the time and the effort (or so I think).
Let's hope all this works out well, and reaches its completion.
When we think of lambda calculus, we think of it as that thing the theory types use when they discuss programming languages. But wait! Lambda calculus is much more fundamental than that! Lambda calculus was invented long before there were any computers (in the modern sense, as programmable machines) or computer languages. Alonzo Church invented lambda calculus as machinery to formalize set theory.
You can't get any more fundamental than that! This is hardcore low-level math! Set theory is what Russell and Whitehead used in their attempt to express the entire field of mathematics as a logical theory (derive all of mathematics from a set of axioms).
They derived the concept of numbers and arithmetic from this. Lambda calculus is considered a stroke of genius: a notation intended to formalize this very set theory (it ran into the same paradoxes that naive set theory fell into, but that is another story). Now we know what we are dealing with!
But then where does computer science fit in? The set of functions definable in lambda calculus is exactly the set of functions that are computable! In fact, lambda calculus and Turing Machines have the same expressive power. Also, they were invented (or conceptualized, or whatever) at just about the same time. Church and Turing even worked together after that in Princeton, before World War II broke out and Alan had to go. Lambda calculus is the backbone of computer science and computability. There are people who say that the various notions derived from the study of lambda calculus have been more useful than the Turing Machine has ever been (but well, the point is moot). But no one questions the value that lambda calculus has in computer science. I must tell you, though, that lambda calculus and computer languages were first related only when Peter Landin used it to give the semantics of Algol 60.
Here are some points that I have pinched from a paper, to give you an idea of how important it is.
Okay, I'll let you off with a last little bit of chronological history:
Around 1924 Schönfinkel developed a simple theory of functions. In 1934, Alonzo Church introduced lambda calculus and used it to develop a formal set theory, which turned out to be inconsistent. More successfully, he used it to formalize the syntax of Principia Mathematica (Whitehead and Russell). In the 1940s, Haskell B. Curry introduced combinatory logic, a variable-free theory of functions. More recently, Roger Hindley developed what is known as type inference. Robin Milner extended it to develop the polymorphic type system of ML, and published a proof that a well-typed program cannot suffer a run-time type error (gotta investigate this, somebody take it up). Dana Scott developed models of the lambda calculus. With his "domain theory", he and Christopher Strachey introduced denotational semantics.
Peter Landin used the lambda calculus to analyze Algol 60, and introduced ISWIM as a framework for future languages (this series might dedicate a substantial portion to ISWIM too). His SECD machine, with extensions, was used to implement ML and other strict functional languages (strict languages can be thought of as those where arguments are evaluated before calling a function). Christopher Wadsworth developed graph reduction as a method for performing lazy evaluation of lambda expressions (this is the basis of implementations of lazy languages like Haskell). David Turner applied graph reduction to combinators, which led to efficient implementations of lazy languages.
Next we'll be jumping right on to lambda calculus.
P.S.
Acknowledgements:
Lots of it from "Foundations of Functional Programming" by Lawrence Paulson.
The history of set theory is rather different from the history of most other areas of mathematics. For most areas a long process can usually be traced in which ideas evolve until an ultimate flash of inspiration, often by a number of mathematicians almost simultaneously, produces a discovery of major importance. Set theory however is rather different. It is the creation of one person, Georg Cantor. Before we take up the main story of Cantor's development of the theory, we first examine some early contributions. The idea of infinity had been the subject of deep thought from the time of the Greeks. In 1847, Bolzano considered sets with the following definition: "an embodiment of the idea or concept which we conceive when we regard the arrangement of its parts as a matter of indifference." [God knows what this means! -S]
Bolzano defended the concept of an infinite set. At this time many believed that infinite sets could not exist. Bolzano gave examples to show that, unlike for finite sets, the elements of an infinite set could be put in 1-1 correspondence with elements of one of its proper subsets. This idea eventually came to be used in the definition of a finite set. It was with Cantor's work however that set theory came to be put on a proper mathematical basis. Cantor moved from number theory to papers on trigonometric series. These papers contain Cantor's first ideas on set theory and also important results on irrational numbers.
In 1874 Cantor published an article in Crelle's Journal which marks the birth of set theory. In his 1874 paper Cantor considers at least two different kinds of infinity. Before this, orders of infinity did not exist but all infinite collections were considered 'the same size'. Cantor shows that the real numbers cannot be put into 1-1 correspondence with the natural numbers. [So there are different degrees of infinity, real numbers being the higher degree -S]. Cantor then introduces the idea of equivalence of sets and says two sets are equivalent or have the same power if they can be put in 1-1 correspondence.
[Lots of people criticized such "useless" mathematics. Infinities!
Transcendental numbers! All crap! - S]. Cantor claimed that mathematics is
quite free and any concepts may be introduced subject only to the condition
that they are free of contradiction and defined in terms of previously
accepted concepts.
[Lot of interesting stuff cut off by me. -S]
The 'ultimate' paradox was found by Russell in 1902 [For more, see
Consider the set A of all sets that are not members of themselves. Russell then asked: Is A an element of A? Both the assumption that A is a member of A and the assumption that A is not a member of A lead to a contradiction. The set construction itself appears to give a paradox. [Think a lot about this, and see to it that you do understand the paradox. Also, if Russell's paradox is in a system - "X is a member" and "X is not a member" are both true, in theory - then any statement whatsoever can be proved! And this is absurd! For more detail, visit the link above. - S]
When Russell told this to Frege, Frege said: "A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. In this position I was put by a letter from Mr Bertrand Russell as the work was nearly through the press." [Mathematicians were really very worried about this paradox, and they started wondering which assumption (axiom) in their theory caused it. As they got more and more worried, in parallel they were looking for axioms that give the power of set theory without running into paradoxes. These were the various attempts to axiomatize set theory. Also, they were scared that some intuitive unknown assumption might be causing all these problems. So they planned to have rigorous methods, and "formalize" the entire theory, so that there are no pitfalls due to the play of the human mind (*very* important), and also to see whether different choices of axioms and notations lead to any progress. Also, once you formalize a system and play around with the rules, you find many extremely surprising, unintuitive results, which you'll not find unless you forget all the "meanings" and just play around with the symbols. Hope you get an idea of how formalization helps, and why it began. I'll stop here, otherwise the mail will get bigger and you guys won't read it at all! -S]
Before I go ahead, some stuff from me. Cantor's set theory was hailed as a great work and very successful, and if this were true, and if all of mathematics could be derived from it, everything could be done by a machine! So mathematicians are no longer required! But then came Russell's Paradox, which showed that (naive) set theory is inconsistent.
This is just the background story, so that you know what you are going to read next. BTW, then came Kurt Gödel and shook all the three worlds, by showing that every formal theory powerful enough to express arithmetic must be either inconsistent or incomplete! Anybody interested, investigate that, and send in stuff. But we'll stick to Russell's paradox for the nonce.
DEFINITION:
Any symbol x is a lambda expression.
If M is a lambda expression, so is \x.M (this is called lambda abstraction; here, M is called the body).
If M and N are lambda expressions, so is M N.
That's it! So let's write a few lambda expressions:
x, \x.x, \x.\y.y, \x.z (\y.x), \x.\y.\z.x y z
Okay, we get the idea.
This style of definition is not new to us. We can call this a "structural" definition - basically, it just defines the structure that lambda expressions have. The definition is also recursive (see lambda abstraction above, for example). We have seen such a definition, say, in the definition of regular expressions.
Any symbol x is a regular expression.
If M is a regular expression, so is M*
If M and N are regular expressions, so is M N.
But this does not finish the definition. We also have:
x is a regular expression for the language {x}.
If M is a regular expression for language L, M* is a regular expression for the Kleene closure of L.
If M and N are regular expressions for languages L and L', the language of (M N) is the concatenation of L and L' (usually written LL' - the set of strings uv with u in L and v in L').
The three lines above give the meaning of regular expressions. But this is missing for lambda calculus! There are no such lines that explain the meaning! And this is important. The concept of regular expressions depends on the concept of languages. So the "meaning" of regular expressions is given in terms of languages. But lambda calculus is intended to be so fundamental that everything else is derived from it! So there can be no such lines giving such "meanings". For the lambda calculus, the definition is it! But then what's the use? Well, the meaning is reflected in the relations and "conversions" between lambda expressions. And these "conversion" rules may be added as part of the definition. More about this next time. But it is important to understand why the definition of lambda calculus contains no lines that explain the meaning of each form (of the structure) of lambda expressions.
Let me tell you a famous story:
There was once a barber. Some say that he lived in Seville. Wherever he lived, all of the men in this town either shaved themselves or were shaved by the barber. And the barber only shaved the men who did not shave themselves.
That is a nice story. But it raises the question: Did the barber shave himself? Let's say that he did shave himself. But we see from the story that he shaved only the men in town who did not shave themselves. Therefore, he did not shave himself. But we again see in the story that every man in town either shaved himself or was shaved by the barber. So he did shave himself. We have a contradiction. What does that mean?
Maybe it means that the barber lived outside of town. That would be a loophole, except that the story says that he did live in the town, maybe in Seville. Maybe it means that the barber was a woman. Another loophole, except that the story calls the barber "he." So that doesn't work. Maybe there were men who neither shaved themselves nor were shaved by the barber. Nope, the story says, "All of the men in this town either shaved themselves or were shaved by the barber." Maybe there were men who shaved themselves AND were shaved by the barber. After all, "either ... or" is a little ambiguous. But the story goes on to say, "The barber only shaved the men who did not shave themselves." So that doesn't work either. Often, when the above story is told, one of these last two loopholes is left open. So I had to be careful, when I wrote down the story.
Now we come to a really serious attempt to solve the above puzzle: Maybe there was no barber like the one described in the story. But the story said, "There was once a barber..." So there really was a barber like that, unless the story is a lie! That is the answer, isn't it? The story is a lie. Sorry about that. I told the story of a barber who could not possibly exist. I had good motives. But I guess I told a lie.
In logic, some statements are true (Jim is nearsighted), some are false (Jim eats squash). And a collection of statements, such as our story, is either consistent or inconsistent. The following pair of statements is inconsistent: "The barber shaved himself" and "The barber did not shave himself."
The above story about the barber is the popular version of Russell's Paradox. The story was originally told by Bertrand Russell. And of course it has a simple solution. It is inconsistent. But the story is not really that simple. The story is a retelling of a problem in set theory.
In set theory, we have sets: collections of objects. These objects may be real physical objects (marbles) or not (cartoon characters, thoughts, or numbers). When we deal with a set, we normally write it down with brackets: {A, B, C}. That set contains three letters, A, B, and C. The set {B, C} is a subset of {A, B, C}. There is a special set with no elements, the empty set {} or ø, such as the set of humans bigger than the earth, or the set of odd numbers divisible by two. Some sets contain infinitely many elements, such as the set of all even numbers.
A set can contain sets. The set {{A, B, C}, {x, y}} contains two sets, {A, B, C} and {x, y}. (The empty set, by the way, is a subset of every set, though not necessarily an element of it.) We can define the set of all sets. This set contains {A, B, C} and {{A, B, C}, {x, y}} and every other possible set. Some sets contain themselves. The set of all red marbles does not contain itself, because it contains no sets at all, only marbles. Let's say that S is a set which contains S and {A, B}. Then this is S: {S, {A, B}}. It contains two sets, itself and {A, B}. The set of all sets obviously contains itself. Well, let's construct a very interesting set: the set of all sets which do not contain themselves.
There is something wrong here. Does "the set of all sets which do not contain themselves" sound like "the barber who shaves all men who do not shave themselves?" The story of the barber was inconsistent. The set of all sets which do not contain themselves is inconsistent for the same reason. Does the set of all sets which do not contain themselves actually contain itself, or not? If it contains itself, then it cannot contain itself. If it does not contain itself, then it must contain itself. It is inconsistent.
But where did we go wrong? Let's make some lists. A list is a special kind of set. But we know what a list is. A list may be clearer in our minds than a set. We cannot actually physically make infinite lists. But we can certainly define some of them, like the list of all even numbers. So we can deal with infinite lists. We can also make lists of lists - say, a list whose entries are the list of all even numbers and the list of all odd numbers.
This list of lists is real. Now, if we allow infinite lists, then it is no stretch at all to produce the list of all lists, and even the list of all lists which do not contain themselves. And that list is inconsistent.
Well, maybe there are no infinite lists. There are infinite sets, for example, the set of all even numbers. And that is a list: the list of all even numbers. The concept of an infinite list is actually fairly simple.
So we have an inconsistent set. That is not all. We made no mistakes when we constructed the set of all sets which do not contain themselves. And that means that (naive) set theory is inconsistent. And that means that logic is inconsistent. And that means that all of mathematics, including algebra and geometry, is inconsistent.
It doesn't invalidate mathematics or logic or set theory. The Pythagorean theorem is still true. But there is some doubt. Kurt Gödel proved that Number Theory (and, by similar arguments, every sufficiently powerful branch of mathematics) is incomplete: he converted a self-referential statement reminiscent of Russell's Paradox into a statement of Number Theory, and showed that if Number Theory is consistent, there are true statements it cannot prove. This had huge repercussions in the world of mathematics. All of this leads to the following problem:
That's why people run away from inconsistencies. To solve this problem, various approaches have been taken, the most successful (and useful) being type theory. There are various type systems, but we talk about the general concept of type theory. So what do we do? We define type rules, and only those expressions that conform to the type rules are valid in the system. We take care to make a type system in which "the set of all sets that are not members of themselves" is not type-correct. So this sentence cannot be derived at all! Does that decrease the expressive power? Well, yes. But that's OK, because most of the useful things that make sense and cause no big problems are still type-correct. Interested people can look for more on this.
Right now, this is all; we return to our lambda calculus.
Now how to capture this in lambda calculus? In order to do this, lambda calculus has "conversions", where one form of a lambda expression can be converted to another. There are three such things - alpha conversion, beta conversion, eta conversion. Let's get to these conversions later. Just bear in mind that there is such a thing, and we are going to define what exactly each conversion is, and this will help in imparting our intuitive meaning to lambda calculus.
Now in maths, if f(x) = x + x, and we want f(2), what do we have? 2 + 2. That is, we put 2 wherever there is x. Simple enough! This is called "substitution" (as if we didn't know!). Here's the notation: M[N/x] means "substitute N for x, in M". Some examples to get used to the notation:
(a b c) [p/c] -> (a b p)
(\x.m n) [y/n] -> (\x.m y)
Well, you get the idea. This substitution is a key operation in lambda
calculus. But there are a few problems:
Consider (\x.y). Intuitively, this is a constant function that returns y irrespective of x.
Now, a naive reading of (\x.y)[x/y] creates (\x.x), which is the identity function, which very much depends on x! Yuk, we don't want this! This'll make substitution very ugly and complicated, and we'll have to be very careful every time we substitute! Substitution should not change the very meaning! One constant function to another constant function using substitution - OK. But this is too much! BTW, this phenomenon is called "variable capture" and we have to avoid it!
Now that you have an idea of the subtleties of substitution, later we'll define it so that there'll be no such problems. On the way, we'll analyze why the problem occurs, and that'll lead us to the notion of bound variables and free variables.
The second one is not a problem. In fact, this is called alpha conversion, but lets get to that later.
Problem is in cases like (\x.y)[x/y] -> (\x.x)
Look, we want to use substitution as a primitive operation for lambda calculus; we don't want it to be so powerful (and ugly) that through some inadvertence we can change the very nature of an expression (in the above example, from an expression that models a constant function to one that models an identity function). So we plan to define substitution so that such problems do not occur.
So our aim now, is to define substitution while at the same time, eliminating the variable capture problem. For this, the notion of free variables and bound variables is very useful. So we go right ahead and define free and bound variables, so as to be well prepared, and tackle substitution..
The notion of free and bound variables is not unique to languages or lambda calculus. They are found all over mathematics. For example, take the summation notation (sigma): in the sum of i^2 for i = 1 to n, clearly "i" is a bound variable (while n is free). Well, the notion is simple enough (but useful and important nevertheless!), so let's go straight ahead and define them for lambda calculus.
Bound Variables: Let BV(M) be the set of bound variables of a lambda expression M. BV is defined as (based on the form of M):
BV(x) = {}                   /* no bound variables */
BV(\x.N) = {x} union BV(N)   /* all the bound variables in the "body" N, and x is also bound by the lambda */
BV(M N) = BV(M) union BV(N)  /* all the bound variables that occur in M and N */
Free Variables: Let FV(M) be the set of free variables of a lambda expression M. FV is defined as (based on the form of M):
FV(x) = {x}                  /* x is free! */
FV(\x.M) = FV(M) - {x}       /* all free variables in M, but not x! (it's bound by the lambda) */
FV(M N) = FV(M) union FV(N)  /* all the free variables that occur in M and N */
That's it! Simple!
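As a sketch, the two definitions transcribe almost line for line into code, using the same hypothetical tuple encoding as before: ('var', x), ('lam', x, body), ('app', m, n):

```python
# Free and bound variables, computed by the same case analysis as the
# definitions above.

def bv(t):
    if t[0] == 'var':
        return set()                  # BV(x) = {}
    if t[0] == 'lam':
        return {t[1]} | bv(t[2])      # BV(\x.N) = {x} union BV(N)
    return bv(t[1]) | bv(t[2])        # BV(M N) = BV(M) union BV(N)

def fv(t):
    if t[0] == 'var':
        return {t[1]}                 # FV(x) = {x}
    if t[0] == 'lam':
        return fv(t[2]) - {t[1]}      # FV(\x.M) = FV(M) - {x}
    return fv(t[1]) | fv(t[2])        # FV(M N) = FV(M) union FV(N)

# \x.x y : x is bound, y is free
term = ('lam', 'x', ('app', ('var', 'x'), ('var', 'y')))
print(bv(term))  # {'x'}
print(fv(term))  # {'y'}
```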
There's one thing to note, though. See how these definitions are based on the structure or form of the lambda expression. In the definition of lambda expressions, there are three forms of lambda expressions. These definitions of free and bound variables give one line for each of the three forms. Such definitions are said to be by "structural recursion" (and proofs over them by "structural induction") in math. Actually it's easy. Say, for example, we want the number of abstractions (i.e. lambdas) plus the number of applications in an expression. How would you write it (or program it)?
It'll be a case analysis on the structure of lambda expressions. Let's write a function foo that does this:
switch(form of expression L)
{
case x:
return 0;
case (\x.M):
return 1 + foo(M);
case (M N):
return 1 + foo(M)+ foo(N);
}
Simple! You want just the number of lambdas in an expression? That's easy too, and will go along the same lines (exercise for the interested reader?)!
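For the curious, the switch above can be made runnable. This is a direct transcription into Python, again using my hypothetical tuple encoding ('var', x), ('lam', x, body), ('app', m, n):

```python
def foo(t):
    """Number of abstractions plus number of applications in a term."""
    if t[0] == 'var':
        return 0                       # case x: no lambdas, no applications
    if t[0] == 'lam':
        return 1 + foo(t[2])           # case \x.M: one lambda plus count of M
    return 1 + foo(t[1]) + foo(t[2])   # case (M N): one application plus both sides

# \x.\y.x y : two lambdas and one application
term = ('lam', 'x', ('lam', 'y', ('app', ('var', 'x'), ('var', 'y'))))
print(foo(term))  # 3
```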
Now we have all the ammunition to tackle substitution - lets do that next.
As we have already seen, \x.M models a function that has x as a parameter and body M. So x is "bound" in \x.M. That is, if you had, say, (N (\x.M)), and both M and N contain x, the "x" in M is different from the "x" in N (since x has a binding "\x" which extends into M). Hope that clears it up - I could give a proper explanation with function applications, but I would be going too far ahead into beta reduction.
Consider (\x.x y (\y. x y)).
Now, if you consider the inner subexpression (\y.x y), y is bound and x is free. But in the entire expression, x is bound. Also, y is free in its leftmost occurrence, but bound in that inner subexpression. Get it? That is, the first y is "different" from the second.
Hope this clarifies the notions of free and bound variables in lambda calculus. We also have seen the exact definition. So we may move on to a precise, problem-free definition of substitution.
In other words, is x bound in (\x.x y) or in the whole of (\x.x y (\y.x y))? If it is bound in the whole subexpression, isn't y bound in it too?
Now, the first y is totally free. There ain't no shadow of no \y (lambda y) over it. But the second occurrence of y is bound by the "\y" that comes before it.
If we have M[N/y] (substitute N for y in M), only free occurrences of y in M are substituted. It's pretty understandable why this is so. (If in doubt, ask.)
Now, the problem we were talking about was in cases like (\y.x)[y/x]. This makes the constant function (\y.x) into the identity function (\y.y), and we don't want this to happen. So next, we look at when substitution is safe.
Consider M[N/x]. Substitution is unsafe when some variable that is free in N is bound in M - the incoming N gets "captured".
Now, what do we do when this happens? Simple. We can just go ahead and rename the bound variable: (\y.x y)[y/x] can now be made into (\z.y z). Look, renaming a bound variable simply doesn't matter (actually this is called alpha conversion; we'll come back to this soon)!
OK, let's wrap up by defining substitution.
Definition: M[L/y], the result of substituting L for all free occurrences of y in M, is given by:
x [L/y] = L if x = y                 // simple enough: if it's the variable you want to substitute, substitute it!
x [L/y] = x otherwise                // if it's some other variable, forget it!
(\x.M)[L/y] = (\x.M) if x = y        // if the variable you want to substitute is bound in the entire expression, chuck it.
(\x.M)[L/y] = (\x.M[L/y]) otherwise  // well, there may be free occurrences in M, so go ahead and substitute.
(M N)[L/y] = (M[L/y] N[L/y])         // look for free occurrences in M and N, and do the needful.
(And then there's that "rename the bound variable in M, if required" as discussed above.)
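The clauses can be transcribed directly. This sketch (same hypothetical tuple encoding as before) deliberately leaves out the "rename if required" step, so it is only safe when the bound variables of M are disjoint from the free variables of L:

```python
def subst(m, l, y):
    """m[l/y]: substitute l for the free occurrences of y in m.
    No capture-avoiding renaming - the caller must ensure safety."""
    if m[0] == 'var':
        return l if m[1] == y else m          # the two variable clauses
    if m[0] == 'lam':
        x, body = m[1], m[2]
        if x == y:
            return m                          # y is bound here: chuck it
        return ('lam', x, subst(body, l, y))  # substitute inside the body
    return ('app', subst(m[1], l, y), subst(m[2], l, y))

# (\x.x y)[z/y] -> (\x.x z)
term = ('lam', 'x', ('app', ('var', 'x'), ('var', 'y')))
print(subst(term, ('var', 'z'), 'y'))
# ('lam', 'x', ('app', ('var', 'x'), ('var', 'z')))
```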
To elaborate the line:
Substitution M[N/y] is safe as long as the bound variables of M are disjoint from the free variables of N.
In M[N/y] (substitute N for each free occurrence of y), consider how the variables of M and N can overlap:
free(M) overlaps free(N): no problem.
free(M) overlaps bound(N): no problem (not much change in meaning; boundness in N stays local).
bound(M) overlaps free(N): problem! (some free variables of N will now be bound - capture).
bound(M) overlaps bound(N): no problem (boundness in N stays local).
Doubt: (\x.x)[x/y] creates (\x.y) (substituting only after the . (dot)). Is this also variable capture?
The reply given then was:
"This is wrong. If we substitute, we substitute all occurrences."
...and today...
If we have M[N/y] (substitute N for y in M), only free occurrences of y in M are substituted. It's pretty understandable why this is so. (If in doubt, ask.)
Contradiction?
About (\x.x)[x/y] creating (\x.y): I should have said "this is wrong because we do not substitute bound variables at all" (and not "all occurrences are substituted"), and that's why this is wrong. But we still can have \x.(x[y/x]). This'll give (\x.y). And this is OK, because substitution did not change the nature of the expression it acted upon. Substitution acted upon only x and changed it to y. The fact that there was a \x outside does not matter (the substitution cannot see it!). All we want is that substitution must not change the nature of the expression it acts upon. Now, if one deliberately wants to do the substitution as given above (for whatever evil purpose he has), that's OK. So all we want to say is that after substitution, the nature of a lambda expression (which underwent substitution) remains the same. Hope I have been clear.
Note that such an ugly instance of substitution will never come up as we unroll the definition of substitution over the different forms of lambda expressions. Such a thing will only come up when explicitly written like that (hope you understood the above lines! read it again!)
Now, about,
"If we have M[N/y] (substitute N for y in M), only free occurrences of y in M are substituted. It's pretty understandable why this is so. (If in doubt, ask.)"
I think all of you have understood this statement. The only problem was that it seemed to contradict the first one, which, I hope, I have cleared up.
If the shadow of doubt lingers or, as they say, looms (which I think it does), feel free to keep asking.
The trivial rule for equivalence is that two lambda expressions are equivalent if they are exactly identical. But that doesn't help us make any progress, does it? So we invent a new concept of "equivalence", and say that we will consider two lambda expressions equivalent if one can be "converted" into the other (or both can be converted into the same third expression - more about this later).
Okay, so there are three conversions - alpha, beta and eta (that n-like symbol - remember the one for efficiency?).
Today, we'll keep it short and simple and deal only with the first, alpha. Actually we already know it, but let's give it a name and official recognition.
The alpha conversion rule is: (\x.M) can be alpha converted to (\y.M[y/x]), provided y does not occur free in M. It means that the exact name of the bound variable does not matter. So (\x.x) and (\y.y) and (\z.z) are all "equivalent" functions.
That's it. I stop here. This has been simple and will give you the breather you need so much.
Note that we have understood what "conversions" are, and that is good. We won't just learn three definitions and wonder what they are all about. In fact, if you can come up with a completely new conversion, and say lambda expressions are equivalent under this conversion, you'll have a completely new theory. But the essence here is: does the conversion capture some intuitive notion? Is it useful? Does accepting this extra rule of equivalence lead to more interesting results?
Beta conversion captures our intuitive notion of functions and function application, the one we have secretly been harbouring in our minds all along. Out with it!
Definition - Beta Conversion: ((\x.M) N) on beta conversion leads to M [N/x]
This is function application! Let L be (\x.g x). So, (L a) is (g a).
This is like, Let L(x) = g(x). So, L(a) = g(a) - function application!
Now we know why (\x.x) is an identity function. ((\x.x) a) is a, ((\x.x) b) is b, ((\x.x)(\x.x)) is (\x.x). When I say "is", I assume that lambda expressions are equivalent after beta conversion. So whatever you give to an identity function, it returns itself. Also, the parameter can be any lambda expression! Since in lambda calculus, everything is a function (may be with 0 parameters), higher order functions and all that, is no big deal!
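A one-step sketch of the rule, built on the substitution clauses from before (same hypothetical tuple encoding; naive, non-renaming substitution):

```python
def subst(m, l, y):
    """m[l/y]: naive substitution, no capture-avoiding renaming."""
    if m[0] == 'var':
        return l if m[1] == y else m
    if m[0] == 'lam':
        if m[1] == y:
            return m
        return ('lam', m[1], subst(m[2], l, y))
    return ('app', subst(m[1], l, y), subst(m[2], l, y))

def beta(t):
    """One beta step at the root: ((\\x.M) N) -> M[N/x]; otherwise unchanged."""
    if t[0] == 'app' and t[1][0] == 'lam':
        _, (_, x, body), arg = t
        return subst(body, arg, x)
    return t

identity = ('lam', 'x', ('var', 'x'))
print(beta(('app', identity, ('var', 'a'))))  # ('var', 'a')
print(beta(('app', identity, identity)))      # ('lam', 'x', ('var', 'x'))
```

Note how the identity function really does hand back whatever it is given - including another lambda expression.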
Now assume that you are using lambda calculus as a programming language. What would you want the interpreter to do? You'd want it to keep doing beta conversions until it can't do them anymore. So I hope that gives you an idea of why beta conversion is the "real thing". But wait! Is this all there is to the interpreter? Will it ever reach a state where no more beta reductions are possible? Well, now comes the scary part - non-termination!
Consider the lambda expression, our friend identity function's big brother -
B = (\x.x x)
So, B a = a a. B f = f f.
We can define such a B, since lambda calculus is untyped - no type restrictions! So B represents a function that takes a parameter and applies the parameter to the parameter itself. Now, what happens if B is applied to itself? It'll apply itself to itself. But that's where we started!
(\x.x x) (\x. x x) on beta conversion gives (\x. x x) (\x.x x) - same thing!
So our interpreter will do the same thing again and again ad infinitum, and no progress.
Now, B can cause infinity in time, but there's more. Let's define a new one - Big B ;o)
Big B is (\x.x x x) - OK, you get the idea, but I'll write it anyway. Now apply Big B to himself: (\x.x x x)(\x.x x x) -> (\x.x x x)(\x.x x x)(\x.x x x) -> four copies -> five copies -> ... the expression keeps growing with every step!
Boy, there goes space too!
That, gentlemen, is beta conversion for you. People also call it beta reduction - but it ain't always a "reduction", is it?
I'll go right ahead and define it.
Definition - Eta Conversion: (\x.M x) on eta conversion gives M, provided M does not contain x as a free variable.
That means if M does not depend on x, (\x.M x) and M "mean" the same. That is both of these, when applied to N will give the same expression on beta reduction. (Easy to check).
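The Python analogue of eta: wrapping a function in a lambda that merely forwards its argument changes nothing observable:

```python
# Eta in Python dress: g is the eta-expansion of f (compare \x.f x with f).
f = lambda n: n * n
g = lambda x: f(x)  # x is not free in f, so g "means" the same as f

print(f(7))  # 49
print(g(7))  # 49 - extensionally, f and g are indistinguishable
```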
This conversion embodies a principle called extensionality: if two functions give the same result when applied to the same argument, the two functions are equivalent. But as the paper I am referring to says, "In some situations, this principle (and eta-conversions) are dispensed with." Actually, I don't see why this is so.
My guess is that this principle is OK for the pure lambda calculus. But once we allow primitive operations like numbers and arithmetic, it becomes difficult (and, in the general case, impossible) to check whether two functions, given the same argument, give the same result.
Another issue may be non-termination. If two functions lead to non-termination, or infinite expressions, should they be considered equivalent, or does the equivalence depend on the specific expressions involved in the non-termination? I am not sure whether such conclusions regarding equivalence can be drawn when infinities come into the picture. Perhaps that is why the principle of extensionality is "dispensed with". I'll elaborate a bit on this. We know, for example, that
(\x.x x x) (\x.x x x) /* our friend Big B */
leads to infinities and non-termination. Now, the above expression is a concise and finite way of writing an infinite expression, which we would obtain by successive beta conversions. But given an arbitrary infinite expression, do we know how to write a concise and finite description like we do for Big B, or in fact, do we know that such a finite description even exists? Actually, I do not know the answer to this, but I think this gives an idea of why people chose to dispense with the principle of extensionality - it's too much trouble.
This is the last conversion. Next time, we move on, armed with all the machinery there is, to play with lambda calculus, and look at the insights that we can obtain, by tinkering around with lambda expressions using these tools.
If no more reductions can be applied to a lambda expression, it's said to be in "normal form". For example, (\x.x) is in normal form, because no reductions are applicable. An expression is said to "have" a normal form if, after applying reductions, it can be brought to a normal form (that is, you reach a point where you cannot apply any more reductions). So, ((\x.x) y) has a normal form, which is y. But we have seen that our friend B applied to itself, ((\x.x x) (\x.x x)), has no normal form, nor does Big B. BTW, this expression is denoted by omega - the same symbol as for ohms.
So if you want a two-parameter function f(x, y) = L, you can write it as \x.(\y.L)
Inside L, x and y may occur as free variables (they are bound by the enclosing lambdas). Now, if the entire expression (\x.(\y.L)) is applied to a parameter, say N, we get another function that needs one parameter. When that is applied too, you get f(x, y). Currying is very popular in the functional programming world because of the convenience it offers. But it's not just that: the concept that a multiparameter function, on being supplied some of its parameters, gives a function that takes the rest of the parameters, is also a very useful way of thinking, and useful in designing programs.
I wish I could think of some nice and simple examples that will show you how cool currying is, but I can't think of anything suitable. Anyway, I'll refer you to a paper. It could have been better, but it's worth a read nevertheless. It's called "Fields in Physics are like Curried Functions Or Physics for Functional Programmers" by Gary T. Leavens. I take the liberty of attaching it along. (You can come to me for a printout.)
Okay, that's about currying. Btw, it's just named after Haskell Curry - he wasn't the first to think of it.
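Here's a minimal Python sketch of the idea (the function names are our own illustration): f(x, y) = x + y written as \x.(\y. x + y).

```python
def add(x, y):
    return x + y              # the ordinary two-parameter function

curried_add = lambda x: lambda y: x + y   # \x.\y. x + y

# supply some of the parameters, and you get back a function
# that takes the rest:
add5 = curried_add(5)
print(add5(3))                          # prints 8
print(curried_add(5)(3) == add(5, 3))   # prints True
```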
Before I go, I'll just give the bracketing conventions.
A two parameter function (\x.(\y.M)) can be written as (\x y. M)
In general, (\x1.(\x2....(\xn.M)...)) maybe written as \x1 x2 ...xn.M
Also, \x.\y.M N means \x.(\y.M N) and not \x.(\y.M) N
Therefore, \x.\y.M N does not admit any beta reductions, but \x.(\y.M) N does. That is, y can be substituted by N wherever appropriate.
Hope the bracketing conventions are clear.
To summarise, \x.M N abbreviates \x.(M N) rather than (\x.M) N. Also, x y z abbreviates (x y) z rather than x (y z).
Before we take leave, here's how an application of a curried function will look:
(\x y. L) M N on beta gives (\y.L [M/x]) N which in turn on beta gives L [M/x] [N/y] - just as desired.
In fact, such summaries are very useful to get an overall view of what has happened so far - sort of a panoramic view of the whole lambda territory that we have been through.
Also, I must add, that this is a great stage in our studies of lambda calculus, to summarize - we have finished what can be called the "definition" of lambda calculus, and we will be moving on to its various insightful applications. So this just fits in perfectly.
Thanks a lot for the contribution.
Sachin's summary follows:
Lambda Calculus:
History
Alonzo Church (machinery to formalize set theory)
Computer Science and lambda calculus => functions in lambda are functions that are computable.
Turing
Russel's paradox
barber ?
inconsistent
Structural definition of lambda expressions: x, \x.y, ....
where's the meaning part ?,
meaning lies in the conversions and relations between expressions
modelling functions in mathematics
lambda abstraction: \x.M equivalent to f(x) = M
function application: M N denotes M=function and N=argument
substitution:
M[N/x] substitute N for x in M
variable capture: changes the form to something not acceptable
free and bound variables
For M[N/x], below are the states of a variable common to both M and N
variable free in M, free in N: no problem
variable free in M, bound in N: no problem (not much change)
variable bound in M, free in N: problem (some free variables of N will now be bound - capture)
variable bound in M, bound in N: no problem (boundness in N will be local)
Conversions
alpha
\x.M can be converted to \y.(M[y/x])
converting say f(x) to f(y)
beta
function application (we've seen a glimpse of that)
((\x.M)N) on beta conversion gives M[N/x]
helpful, go on doing beta conversions until real thing comes,
normalization
problem: sometimes non-termination, omega
eta
(\x.M x) on eta conversion gives M provided x is not a free variable in M
extensionality
equivalent functions
what about non-terminating functions, always equivalent or only if same up to some level
Currying
bracketing conventions
x y z taken as (x y) z
(\x.(\y.M)) written as (\x y .M)
example of beta conversions step by step
(\x y.L) M N
(\x. (\y.L)) M N
(\y.L)[M/x] N
(L)[M/x][N/y]
L[M/x][N/y]
Let's start with how to encode boolean values in lambda calculus. We choose the booleans because there are just two of them. Let's tackle this first and then move to infinite data types like the natural numbers. So how do we go about encoding booleans? Well, let's first see what we know about booleans. We know that there are two of them - true and false. And they have some meaning (which is nice and secure in our minds). Then there are things like if, and, or, not that use these values (so in a sense, they impart meaning to the tokens "true" and "false").
Ok, a little philosophical chatter follows. Our mind learns incrementally from experience. Then it extrapolates and develops new ideas. By experience and thinking, we have evolved the notions of true and false. These notions are not formalized in the mind. They are things that work in the most often occurring cases - "This is an apple" is true or false, depending on "this". Our minds also evolve the concepts of "if", "not", "and", "or" and all these things. But these notions need not be consistent. Our minds can live with that. So when Socrates says "whatever Plato will say next will be true", and Plato immediately says "Socrates just lied", we are baffled. Most of the ideas we have are those that work in the most often occurring cases. When we push them to the limits of their domain, we are baffled, and think of new ways to define the idea. Formalization is a similar effort. We want to define things completely - they may not match with what our minds have learnt by experience, in some cases, but we want them to be consistent. In fact, one of the big uses of formalization is that we can check how much of a concept we really understand. In formalization, nothing is implicit or assumed - everything must be put on paper. The system is all that which is on paper, and nothing but that. But the unfortunate side effect is that we end up with a whole lot of things on paper, and it is a big mess! So we try to keep the system small and the axioms few, and try to encode everything in that. Since we had a hundred things in mind, but we put only a few things on paper, and try to build everything from that, things start to become unintuitive. But formalists still prefer to keep the number of primitive symbols and operations to a bare minimum - they can develop whatever intuition is required!
The point of all this discussion was to tell you that adding symbols called "true" and "false" to lambda calculus is not what we aim for. If that's the path we choose, we can add symbols for 1, 2, 3, and we have natural numbers; add symbols for integration, summation and all that, and we have the entire mathematics! But no! We have frozen our definition of lambda calculus here! Nothing more is to be added!
So how to encode true and false? true will be some lambda expression and so will false. Both will be functions. "if" should work with them. So we are looking for something that satisfies:
if true M N = M
if false M N = N
Note that, "and", "or", "not" and other logical operations can be defined in terms of "if".
There will definitely be a lot of encodings that'll satisfy the above laws, but the simplest and the most commonly used are:
true = \x y. x /* a function that takes two arguments and always returns the first */
false = \x y. y /* a function that takes two arguments and always returns the second */
if = \c x y. c x y /* just apply the condition (c) to x and y, and you have it! (if c is true, it'll return the first arg: x, similarly false will work too) */
Now, "and p q" is just "if p then q else false"
and = \p q. p q (\x y. y)
("or", "not" and all that are exercises for the interested reader).
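The encoding above translates directly into Python lambdas - here's a quick sketch (the uppercase names are ours, to avoid clashing with Python's own True/False):

```python
TRUE = lambda x: lambda y: x     # \x y. x -- returns the first argument
FALSE = lambda x: lambda y: y    # \x y. y -- returns the second argument
IF = lambda c: lambda x: lambda y: c(x)(y)   # \c x y. c x y
AND = lambda p: lambda q: p(q)(FALSE)        # \p q. p q false

print(IF(TRUE)("first")("second"))           # prints first
print(IF(AND(TRUE)(FALSE))("yes")("no"))     # prints no
```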
Well, now we have an idea of how things can be encoded in lambda calculus. The aim is to encode entire mathematics! Well, we have made a beginning (not that we ever are going to reach the end!).
Another interesting thing in the way we have encoded booleans is that the data seems to carry its control along with it. So the branching of control in if-statements is modelled in the definition of the booleans itself. Why this happens is that in lambda calculus everything is a function, and we don't intend to have special symbols with special meanings. So the only way to encode a concept is to capture its behaviour. So booleans carry their control with them - that is what makes them boolean values!
After booleans, we proceed to see how to encode natural numbers. There are various encodings in use, but presented here is the encoding originally used by Church. It seems other encodings are preferred today; one of you can take up that investigation (they pertain more to second-order lambda calculus).
So, we want to encode 0, 1, 2, 3,....
This is how Church did it.
zero = \f x. x /* A function that takes two arguments and returns the second */
Incidentally, this is nothing but our friend "false", ("of course!" say the C programmers).
one = \f x. f x /* A function that takes two arguments, and applies the first to the second. */
two = \f x. f (f x) /* Two applications */
In general, the number n has n applications. (The second argument is used to represent zero.)
Its like the successor function used when defining natural numbers.
0 is a natural number.
if n is a natural number, successor(n) is a natural number.
So zero is 0, and 1 is successor(0), 2 is successor(successor(0)) and so on.
So the number of "f"s gives you the number being represented.
Get used to this encoding, we'll discuss how to write arithmetic operations, next.
0 is \f x.x /* apply f zero times */
1 is \f x. f x /* apply f once */
2 is \f x. f (f x) /* apply f twice */
3 is \f x. f (f (f x)) /* apply f thrice */
ok, you get the drift.
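Here's the same encoding as a Python sketch. The helper to_int is our own (not part of the calculus): it decodes a numeral by applying "+1" to 0, n times.

```python
ZERO = lambda f: lambda x: x               # \f x. x  -- apply f zero times
ONE = lambda f: lambda x: f(x)             # \f x. f x
TWO = lambda f: lambda x: f(f(x))          # \f x. f (f x)
THREE = lambda f: lambda x: f(f(f(x)))     # \f x. f (f (f x))

def to_int(n):
    # decode by counting the applications of f
    return n(lambda k: k + 1)(0)

print(to_int(ZERO), to_int(THREE))  # prints 0 3
```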
Now we have to work with these numbers. Let's start with addition. We have to write a function add, such that,
add m n = m + n
So we want the answer to be an expression, that takes an f, and an x, and applies f to x, (m+n) times.
So, it'll look like
add m n = \f x. M, where M will be an expression in terms of m, n, f and x.
or rather, add = \m n f x. M
I'll guide you to the answer now. But you might want to think a bit about it first (do it). Actually, I am not sure whether to try and take you through a thought chain that'll lead you to the answer, or give you the answer and leave it to you to figure how we came there. I guess I'll lead you through it, but keep a free and open mind, there may be other ways to get there - don't let my path to the answer narrow your thinking down, to one "method".
Now, we know one is \f x. f x, two is \f x. f (f x). Let's put our friend successor in place of f, to make it easier to read.
So,
one is \x. succ x
two is \x. succ(succ x)
And so on. To make it even easier, let's remove the \x, and put an underscore in place of x, to show that it's an argument.
So,
one is succ _
two is succ(succ _)
three is succ(succ(succ _))
Ok, think again? Are you getting the answer? Here's a hint - if you look at, say, three, it "adds" three to the number which is in place of the underscore. So if you put 1 in that underscore, you get 3 + 1. If you put 2, you get 3 + 2. If you put n, you get 3 + n. That means, once you have supplied the first argument (f) to a number (say m), you get a curried function (add m), that adds m to its argument. All you have to do is supply the number to which you want m to be added.
So,
if you have m, and supply the first argument f, just give second argument n, and you get m + n!
Now, have we got the answer? Well, I hope so! I'll write it down anyway,
add = \m n f x. (m f) (n f x) /* the final answer! */
As you see, (m f) is our curried function (add m), and it is applied to (n f x), that is, n with both its arguments supplied as required.
With our bracketing conventions, we can write add as,
add = \m n f x. m f (n f x)
Now, lets check,
add = \m n f x.(m f) (n f x)
Given a "m" and a "n",
we get \f x. (m f) (n f x)
which expands to \f x. (m f) (ffff...n times x)
and then, \f x. (ffff...m times) (ffff...n times x)
which is \f x. (ffff...m+n times) x
which is nothing but (m + n)!
Well, hope I have given you a start, think about it a bit and you'll get the feel.
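The add we just derived can be checked mechanically with a Python sketch (to_int is a decoding helper of our own):

```python
# add = \m n f x. m f (n f x) -- (m f) applied on top of (n f x)
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

TWO = lambda f: lambda x: f(f(x))
THREE = lambda f: lambda x: f(f(f(x)))
to_int = lambda n: n(lambda k: k + 1)(0)

print(to_int(ADD(TWO)(THREE)))  # prints 5
```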
Well, next we have to tackle multiplication. It'll be something like, mul = \m n f x. M
Can you get the M? I'll wait a few days for some replies, before moving on.
Do try.
add = \m n succ x.(m succ) (n succ x)
For multiplication, we want the add operation itself to be applied m times, so we would want to substitute addn for succ.
mul = \m n succ x.(m addn) (n succ x)
mul = \m n succ x. (m addn) zero which is, addn to zero m times.
addn = \n f x. n f x
addn n = (\n f x. n f x) n = \f x. n f x = n
Actually, add n is a curried function of add, and in its body n should occur free.
So, addn = \p f x. (n f) (p f x) /* we are adding n to the parameter p */
mul= \m n f x. (m (\n f x. n f x))(n f x)
mul = \m n succ x.(m addn) zero
so mul = \m n.m (\p f x.(n f) (p f x)) (\f x.x)
Well, mul in this form seems a lot more complicated than what Sachin has sent. But they both are equivalent (I am not able to construct a proper proof now, will try harder when I am not so tired! But it doesn't seem so easy).
But the principle here is important. "succ" in m's structure is replaced by "addn", and you get multiplication by n! This is something like a homomorphism. Keep the same structure, but change the trusses and the hinges, and you get something else! I was going to leave exponentiation (m raised to n) as an exercise, but if you follow the same route to get exponentiation, you put muln in place of succ! So, you exploit the fact that any natural number n has n operations over a base. For addition, you change the base (to m). For multiplication, you change the operation (to addn). For exponentiation - you change the base (to one), and the operation (to muln).
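The mul above can be sanity-checked numerically in Python (a sketch; to_int is our own decoding helper):

```python
# mul = \m n. m (\p f x. (n f) (p f x)) (\f x. x)
# -- "add n" applied to zero, m times.
MUL = lambda m: lambda n: m(
    lambda p: lambda f: lambda x: n(f)(p(f)(x))   # addn = \p f x. (n f) (p f x)
)(lambda f: lambda x: x)                          # zero = \f x. x

TWO = lambda f: lambda x: f(f(x))
THREE = lambda f: lambda x: f(f(f(x)))
to_int = lambda n: n(lambda k: k + 1)(0)

print(to_int(MUL(TWO)(THREE)))  # prints 6
```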
mul = \m n f x. (m (n f x))(n f x) ??
Or even:
mul = \m n f x. (m n) (n f x) ??
Not sure if this is correct, but it seems so somewhat intuitively, since it's effectively obtained from add, except that we substitute n for f in the left term. So instead of "increment", we have "add n times" which is what multiplication is.
Can anybody furnish a proof of how the two forms of mul are in fact equivalent? I'll try too.
Haven't checked any text as yet. I thought I should really do some *work* here instead of simply referring to a book :)
Let me know if I am close...
pair = \x y f. f x y
So pair two seven = \f.f two seven
where two and seven are encoded using our encoding for our natural
numbers.
Now, what are the operations on pairs:
For extracting the first element, we need a function that takes a pair, and returns the first element.
fst = \p. p true
where true is our old friend \x y. x
It's easy to see how this'll work. p is a pair. Look at the definition of pair: you see that p is something that takes a function (of two parameters) and applies it to the two elements of the pair. Now, the function that we supply is true, which returns the first of its two arguments. So what we'll get is the first element of the pair.
Lets see how this works:
if we build the pair of two and seven, we have (\f. f two seven).
Now to extract the first of this:
fst (\f. f two seven) = (\p. p true) (\f. f two seven) /* by definition of fst */
= (\f. f two seven) true /* by beta reduction */
= true two seven /* by beta reduction */
= (\x y. x) two seven /* by definition of true */
= two /* by beta reduction */
It's easy to define the function to extract the second element of a pair.
snd = \p. p false
So we can now build pairs of data - note that it's a compound data type. So we can have pairs of numbers, pairs of booleans, a number and a boolean, pairs of pairs - anything. So we sure are making progress.
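The pair encoding also translates directly into a Python sketch:

```python
PAIR = lambda x: lambda y: lambda f: f(x)(y)   # pair = \x y f. f x y
FST = lambda p: p(lambda x: lambda y: x)       # fst = \p. p true
SND = lambda p: p(lambda x: lambda y: y)       # snd = \p. p false

p = PAIR(2)(7)
print(FST(p), SND(p))  # prints 2 7
```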
I'll leave you with an interesting exercise, thinking about which is useful, insightful and will help you understand the material to come next better. Write the factorial function - a function that'll take a natural number and return its factorial.
Here's what we know about the factorial function:
Ok, think think think. How is a natural number represented? It is a function that takes two things - a function (f) and a value (x), such that the function can operate upon this value. We want to use this representation to get the factorial.
But what about recursion? We know that fact(n) = if n = 0 then 1 else n * fact(n-1)
How do we encode this recursion? Hey, but a natural number is represented as f(f(f(f(x)))) - that is a kind of recursion, isn't it? We have to use it somehow. If only we could decide upon an "f" and an "x"!
What do we want f to do? We want the nth f, to use the value returned by the (n-1)th f, to give the factorial of n. What will an f require? It'll require the factorial of (n-1). Yes, but is that enough? No, it also needs a way to keep track of n. Bingo! Let f return both these things in a pair - a pair whose first component is n, and the second component is factorial(n-1). Cool! So we've decided on the type of x too! "x" gotta be a pair too!
What should x be? When the first f is applied to x, we want it to give the pair (2, factorial(1)), so that it can be used by the second f. Right, but what should the first f get? It should get (1, factorial(0)), which is nothing but (1, 1). Great! So we have fixed x! Now we can focus completely on f.
f must take a pair (n, factorial(n-1)) and give a pair (n+1, factorial(n)), which is nothing but (n+1, n*(factorial(n-1))).
So, f is nothing but something that takes a pair p and returns another - let's write it.
f = \p. pair (succ (fst p)) (mul (fst p) (snd p)).
In case you are in doubt, "fst" and "snd" give the first and second elements of a pair. "mul" is multiplication, and "succ" gives the successor of a natural number.
Ok, lets write all these functions in lambda terms:
pair = \x y f. f x y
fst = \p. p (\x y. x)
snd = \p. p (\x y. y)
mul = \m n.m (\p f x.(n f) (p f x)) (\f x.x)
succ = \n. (\f x. f (n f x))
So our
f = \p. pair (succ (fst p)) (mul (fst p) (snd p)) becomes,
f = \p. (\x y f. f x y) ((\n. (\f x. f (n f x))) ((\p. p (\x y. x)) p)) ((\m n. m (\p f x. (n f) (p f x)) (\f x. x)) ((\p. p (\x y. x)) p) ((\p. p (\x y. y)) p))
Don't try too hard to read this; I have just substituted for pair, succ, fst, snd, and mul.
Ok, we have got f, but we don't want a pair! We want just the factorial.
Therefore,
factorial = (\n. snd (n f x))
where f = the f we have found above,
x = (1, 1), which is pair one one, that is, (\x y f. f x y) (\f x. f x) (\f x. f x)
Phew! Lets write it all!
factorial = \n. (\p. p (\x y. y)) (n (\p. (\x y f. f x y) ((\n. (\f x. f (n f x))) ((\p. p (\x y. x)) p)) ((\m n. m (\p f x. (n f) (p f x)) (\f x. x)) ((\p. p (\x y. x)) p) ((\p. p (\x y. y)) p))) ((\x y f. f x y) (\f x. f x) (\f x. f x)))
Done!
Of course, if we perform beta-reductions, the expression might get smaller! But we aren't machines, are we?! If anybody can make like an automaton and do it - great! Otherwise, as long as you've understood it all, the goal's achieved.
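The construction can be checked with a quick Python sketch (an assumption of ours: native integers are used inside the pair for readability, instead of Church numerals):

```python
PAIR = lambda x: lambda y: lambda f: f(x)(y)   # pair = \x y f. f x y
FST = lambda p: p(lambda x: lambda y: x)        # fst = \p. p true
SND = lambda p: p(lambda x: lambda y: y)        # snd = \p. p false

# f maps (n, factorial(n-1)) to (n+1, factorial(n))
step = lambda p: PAIR(FST(p) + 1)(FST(p) * SND(p))

THREE = lambda f: lambda x: f(f(f(x)))          # Church numeral 3

# factorial = \n. snd (n f x), with x = (1, 1)
factorial = lambda n: SND(n(step)(PAIR(1)(1)))

print(factorial(THREE))  # prints 6
```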
So I hope everybody attacks this with renewed vigour, and there are a lot more discussions, in the familiar territory where functions are written as "f(x)", and not "\x.M". But I must also caution you that it isn't easy. Well, few things are easy at the start. But that is exactly where such a mail series helps! We can discuss and harass each other, until we make sure that all of us understand!
On to Recursive Function Theory tomorrow! I must also tell you that most of the material will be copied from an article (no, I won't say which one yet!). But you'll get it in bits and pieces. And we should move ahead only after satisfying ourselves that we have understood. Also, if I get no mails for a day or two after a posting, I assume that everybody has understood. If anybody wants me to wait, feel free to let me know. And if anybody wants to go back to something that he thinks the others have understood, and it's too far behind, still feel free to go back - let's exploit this mail series method to its fullest!
Recursive Function Theory...
Recursive function theory, like the theory of Turing machines, is one way to make formal and precise the intuitive, informal, and imprecise notion of an effective method. It happens to identify the very same class of functions as those that are Turing computable. This fact is informal or inductive support for Church's Thesis, asserting that every function that is effectively computable in the intuitive sense is computable in these formal ways.
[Note that recursive functions here means a particular class of functions, and not "functions who call themselves" kind. - Srineet.]
Like the theory of Turing machines, recursive function theory is vitally concerned with the converse of Church's Thesis: to prove that all the functions they identify as computable are effectively computable, that is, to make clear that they use only effective methods.
["Effective" here, seems to mean that it can be done in finite time. Not too sure though. - Srineet.]
Recursive function theory begins with some very elementary functions that are intuitively effective. Then it provides a few methods for building more complicated functions from simpler functions. The building operations preserve computability in a way that is both demonstrable and (one hopes) intuitive. The initial functions and the building operations are very few in number, and very simple in character, so that they are entirely open to inspection. It is easy to trust the foundation, and by its nature if you trust the foundation you trust the superstructure it permits you to build.
Because recursive function theory was developed in part to capture the intuitive sense of effectiveness in a rigorous, formal theory, it is important to the theory that the class of recursive functions can be built up from intuitively effective simple functions by intuitively effective techniques. Ironically, in my view, no author that I have seen expounds recursive function theory for beginners so that it really looks as intuitive as it intends to be. My intention here is to make the theory intuitively clear without sacrificing rigor.
Logicians differ in their choice of the set of elementary functions that comprise the ultimate or primitive "ingredients" of all other functions. Picking them is comparable to picking axioms for a formal system that is to capture a predetermined body of mathematics. There is some slack, but not complete freedom.
If we continue the comparison to axiomatics, then we must revert to an ambition of axiomatics almost completely abandoned in the 20th century: the ambition to make the axioms "self-evident truths". The elementary functions must be intuitively effective or computable. If they are not, one will still get an interesting, perhaps an identical, set of resulting functions; but one will have abandoned part of the programmatic motive of the theory. (It would be much like developing a "modified" Turing machine that did the same work as an ordinary Turing machine but whose method of computation was so complex that it was mysterious.)
Our elementary functions are total, not partial. The reason should be clear. Only total functions fit the intuitive sense of effective method. A partial function is not defined for some arguments, and hence computes no result for those values. It is part of the definition of an effective method that it will always produce the correct answer.
Our elementary functions are all functions of the natural numbers. Hence, they may take zero as input, but not a negative number, and not any rational or irrational fraction.
[ Here, you can see that this theory, in a way, includes natural number theory. But when I say "natural number theory" it is just that wee little subset, where "zero is a natural number, and if n is a natural number, so is succ(n)." The various arithmetic operations are not defined - so no division by zero and such trouble! But in lambda calculus, this definition of natural numbers is just an encoding that we have made up, because we think it satisfies the notion of natural numbers that we want it to satisfy. - Srineet.]
On most lists of "ingredient" functions are the zero function, the successor function, and the identity function.
z(x) = 0
s(x) = successor of x (roughly, "x + 1")
id(x) = x
The zero function returns zero regardless of its argument. It is hard to imagine a more intuitively computable function.
The successor function returns the successor of its argument. Since successorship is a more primitive notion than addition, the "x + 1" that I've used to elucidate the function is simply for the convenience of readers. The addition function does not yet exist and must be built up later.
If successorship is obscure or non-intuitive in the absence of addition, remember that we have allowed ourselves to use the system of natural numbers. We can't calculate yet, but we can count, and that is all we need here. (Hence, the core of recursive function theory that must be intuitive includes not only simple functions and their building operations, but also the natural numbers.)
The zero and successor functions take only one argument each. But the identity function is designed to take any number of arguments. When it takes one argument (as above) it returns its argument as its value. Again, its effectiveness is obvious. When it takes more than one argument, it returns one of them.
id2,1(x,y) = x
id2,2(x,y) = y
The two subscripts to the identity function indicate, first, the number of arguments it takes, and second, which argument will be returned by the function. (We could write the identity function more generally so that it applied to any number of arguments, but to preserve simplicity we will not.)
While others could be added, these three elementary functions suffice (with the techniques to be introduced) to build up all the functions we believe to be computable in the intuitive sense. Given the simplicity of these "ingredient" functions, this is a remarkable simplification of computation.
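As a sketch, the three elementary functions translate directly into Python (the subscripted identity functions are written id2_1 and id2_2 here, a naming of our own):

```python
def z(x):
    """Zero function: returns 0 regardless of its argument."""
    return 0

def s(x):
    """Successor function: roughly "x + 1"."""
    return x + 1

def id2_1(x, y):
    """Two-argument identity returning the first argument."""
    return x

def id2_2(x, y):
    """Two-argument identity returning the second argument."""
    return y

print(z(42), s(41), id2_1(1, 2), id2_2(1, 2))  # prints 0 42 1 2
```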
[So these are the basic functions on which we plan to base the theory of recursive functions. - Srineet.]
We will build more complex and interesting functions from the initial set by using only three methods. If we use the analogy to axiomatics again, the methods for building higher functions from our initial set correspond to rules of inference applied to axioms.
The three methods are composition, primitive recursion, and minimization.
Composition:
If we start with the successor function,
s(x)
then we may replace its argument, x, with a function. If we replace the
argument, x, with the zero function,
z(x)
then the result is the successor of zero.
s(z(x)) = 1
s(s(z(x))) = 2 and so on.
In this way, compounding some of the initial functions can describe the natural numbers.
This building operation is called "composition" (sometimes "substitution"). It rests on the simple fact that computable functions have values, which are numbers. Hence, where we would write a number (numeral) we can instead write any function with its argument(s) that computes that number. In particular, we may use a function with its argument(s) in the place of an argument.
If we think of a function and its arguments as an alternate name of its value, then composition simply allows us to use this kind of name as the argument to a second function.
In the language used in predicate logic, a function with its arguments is a term, and may be used wherever terms are used in predicate logic well formed formulas. The arguments of functions are also terms. Hence, a function can have other functions as arguments.
We can define the function that adds 2 to a given number thus: add2(x) = s(s(x)). Introducing new names to abbreviate the compound functions created by composition is a great convenience since complex functions are made more intuitive by chunking their simpler components. But the essence of composition is not the re-naming or abbreviating of compound functions, but the compounding itself.
Obviously, the computability or effectiveness of composition is not affected by the number of arguments in any of the component functions, provided it is finite.
If a function is a rule (for picking a member of the range when given a member of the domain), then it must be computed to reach its result. In that sense we may consider functions to be programs or software, and their arguments input or data. From this perspective, the act of composition replaces data with a program that, when run, gives as output the data one desired as input. Composition shows the interchangeability, under controlled circumstances, of software and data.
It should be clear that when composition is applied to computable functions, only computable functions will result. Even if this is obvious, it may be well to aid intuition with reason. Composition yields computable functions when applied to computable functions because it does nothing more than ask us to compute more than one function and to do so in a certain order (working from the inside of the compound expression to the outside). If the ingredient functions are computable, then the task of computing any finite number of them in sequence will be computable. If the number of nested functions is finite (if the length of wffs is finite), then the effectiveness of the overall computation is not diminished.
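Composition is easy to see in a Python sketch, using the add2 example from the text:

```python
def z(x):
    return 0          # zero function

def s(x):
    return x + 1      # successor function

def add2(x):
    return s(s(x))    # composition: a function used in an argument place

print(s(z(0)))   # prints 1 -- s(z(x)) = 1
print(add2(5))   # prints 7
```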
The second building operation is called primitive recursion. To those not used to it, it can be far from intuitive; to those who have practiced with it, it is fundamental and elegant.
Function h is defined through functions f and g by primitive recursion when
h(x,0) = f(x)
h(x,s(y)) = g(x,h(x,y))
Let's unpack this slowly. First, remember that f and g are known computable functions. Primitive recursion is a method of defining a new function, h, through old functions, f and g. Second, notice that there are two equations. When h's second argument is zero, the first equation applies; when it is not zero, we use the second. Notice how the successor function enforces the condition that the argument be greater than zero in the second equation. Hence, the first equation applies in the "minimal case" and the second applies in every other case. (It is not unusual for a function to be defined by a series of equations, rather than just one, in order to show its value for relevantly different inputs or arguments.)
So when (1) the second argument of h is zero, the function is equivalent to some known function f and we compute it; otherwise (2) it is equivalent to some known function g and we compute it. That's how we know that function h, or more generally, the functions built by primitive recursion, will be computable.
It is the second case that is interesting. We already know how to compute function g, but look at its two arguments in this case. Instead of taking two numbers, function g is here a compound function and takes one number x and another function. Its second argument is expressed by function h again! This circle is the essence of primitive recursion. To calculate function h when neither argument is zero we must calculate function g, which requires us to calculate function h.
This would be impossible if the function took the same arguments in both instances. It would be like saying to someone who could not multiply 10 by 20, "Nothing could be simpler; one only has to multiply 10 by 20!" Fortunately, that is not the case here. Compare the arguments for function h that we see on the left and right sides of the second equation. To calculate h when its arguments are (x,s(y)) we must calculate h when its arguments are (x,y). So the function is calling itself with different arguments. Note the second argument in particular. It has decremented from the successor of y (= y+1) to y itself. This means that to calculate h, we must decrement one of its arguments by one and calculate that. So the function does not futilely call itself; it calls a variation of itself, which makes all the difference.
(We could write the equations for primitive recursion more generally to show how a function of any number of terms calls a variation of itself, but again for the sake of simplicity we will not.)
But how do we calculate the decremented variation of the original? Look at the equations. To calculate h when its second argument is n, we calculate h when its second argument is n-1; and to do that we calculate h when its second argument is n-2; and so on; that is the procedure required by the two equations. But eventually by this means the second argument will equal zero, in which case we use the first equation, and calculate function f, and are done.
Let's look at an example. To calculate 5! (5 "factorial"), we multiply 5x4x3x2x1. How would we define the general factorial function recursively? To say that it is n(n-1)(n-2)...(n-(n-1)) would be correct but not recursive. In that formulation, the function never calls itself and it contains the mysterious ellipsis (three dots). Recursive functions not only have the peculiarity of calling themselves, but they eliminate the need to use the ellipsis. This is a considerable improvement in clarity, rigor, completeness of specification, and intuitive effectiveness, even if it is still "weird" from another standpoint.
So, the recursive definition of the factorial function would be constructed like this. Initially, we note that 1! = 1. Now, to compute n!, we do not multiply n by (n-1), for that would commit us to the non-recursive series we saw above. Instead we multiply n by (n-1)!. And that's it; there is no further series. But to calculate (n-1)! we refer to our equations; if we have not reached 1 yet, then we multiply n-1 by (n-2)!, and so on until we do reach 1, whose factorial value is already defined. So the general formula would be expressed by these two equations:
1! = 1
n! = n(n-1)!
If we had already bothered to build up the functions for multiplication and subtraction from our initial functions and the building operations, then we could show more particularly how primitive recursion works as a building operation. The factorial function is derived by primitive recursion from the functions for multiplication and subtraction. A stricter definition of the factorial function, f(n), then, consists of these two equations:
f(n) = 1 when n = 1
f(n) = n(f(n-1)) when n > 1
The first equation defines the condition under which the self-calling circular process "bottoms out". Without it we would have an infinite loop which never returns an answer.
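The two-equation scheme itself can be written down as a Python sketch. The helper name primitive_recursion is our own, and the example instantiation (addition, built from f(x) = x and a successor-applying g) is an assumption chosen to fit the simplified scheme above, since this simplified g does not receive y itself:

```python
# Primitive recursion as a building operation:
#     h(x, 0)   = f(x)
#     h(x, y+1) = g(x, h(x, y))

def primitive_recursion(f, g):
    def h(x, y):
        if y == 0:
            return f(x)            # minimal case: second argument is zero
        return g(x, h(x, y - 1))   # otherwise: call a decremented variation
    return h

# Addition, built from f(x) = x and g(x, r) = r + 1:
#     add(x, 0)   = x
#     add(x, y+1) = s(add(x, y))
add = primitive_recursion(lambda x: x, lambda x, r: r + 1)
```

The recursion "bottoms out" exactly as the first equation promises: each call decrements y until it reaches zero.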
This kind of self-calling would clearly work just as well if the "bottom" condition were some value greater than n, and we incremented instead of decrementing until we reached it.
Primitive recursion is like mathematical induction. The first equation defines the basis, and the second defines the induction step.
Note that not all "recursive functions" use the building operation called "recursion". (Worse, not all "primitive recursive functions" use "primitive recursion".) They are called "recursive functions" because they are recursively defined, just as a formal system typically defines its wffs recursively, enabling it to deal with an infinity of instances through an effective method for deciding whether arbitrary candidates are cases. The use of the terms "recursive" and "primitive recursive" for components of the overall theory, and again for the overall theory itself, is confusing and regrettable, but that's the way the terminology has evolved.
We saw the function
h(x, 0) = f(x)
h(x, s(y)) = g(x, h(x, y))
Consider h(x, 2); its evaluation can be drawn as an expanding tree:
          h
         / \
        x   2

on expansion

          g
         / \
        x   h
           / \
          x   1

on further expansion

          g
         / \
        x   g
           / \
          x   h
             / \
            x   0

finally...

          g
         / \
        x   g
           / \
          x   f
              |
              x
f and g are known functions, and h, the unknown in the definition, disappears! In fact, all primitive recursive functions on natural numbers will have exactly this same structure. Only the fs and the gs will change, and the type of x will be the type expected by f and g. We can call primitive recursion "structural recursion" because the overall structure, or framework, is fixed: zero is the base case and we build over it. In fact, the very "data structure" here, the natural numbers, is defined that way. We can exploit this structure and work on any kind of object (the type of x has no restriction); we use the second parameter to obtain the desired "structure" in the recursion.
What other kind of recursion can there be? Well the most general equation can be something like
f x = if p then c else f (g x)
where p is some arbitrary predicate (a random oscillator!), and hence nothing can be said in general about whether such functions terminate.
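A Python sketch of this general pattern makes the contrast with primitive recursion visible; the helper name general_recursion and the count_down example are our own illustrations:

```python
# The general recursion pattern  f(x) = c if p(x) else f(g(x)).
# Unlike primitive recursion, nothing in the shape of this definition
# guarantees that the chain x, g(x), g(g(x)), ... ever satisfies p,
# so termination depends entirely on the particular p and g chosen.

def general_recursion(p, c, g):
    def f(x):
        return c if p(x) else f(g(x))  # may recurse forever
    return f

# A terminating instance: p is "x == 0" and g decrements, so f halts
# on any natural number.
count_down = general_recursion(lambda x: x == 0, "done", lambda x: x - 1)

# A non-terminating instance would use, e.g., g(x) = x + 1 with the
# same p on a positive x; we simply never run it.
```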
Now we know why everybody loves primitive recursion: we know it will terminate, and it is so useful! We must also understand that this structural recursion is not tied to this particular structure (or to the natural numbers, which is the same thing!).
Consider binary trees instead of natural numbers.
A tree is either a leaf or a node with two children, and the recursion follows that structure:
h(x, leaf) = f(x)
h(x, node leftChild rightChild) = g(x, b(h(x, leftChild), h(x, rightChild)))
There are three known functions here: f, g, and b.
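Here is a minimal Python sketch of structural recursion over binary trees; the Leaf/Node classes, the helper name tree_recursion, and the leaf-counting instantiation are all our own illustrations:

```python
# Structural recursion over binary trees, mirroring
#     h(x, leaf)     = f(x)
#     h(x, node l r) = g(x, b(h(x, l), h(x, r)))

from dataclasses import dataclass

class Leaf:
    pass

@dataclass
class Node:
    left: object
    right: object

def tree_recursion(f, g, b):
    def h(x, t):
        if isinstance(t, Leaf):
            return f(x)                               # base case at each leaf
        return g(x, b(h(x, t.left), h(x, t.right)))   # combine the subtrees
    return h

# Count the leaves of a tree: f(x) = 1, b = addition, g ignores x.
count_leaves = tree_recursion(lambda x: 1,
                              lambda x, r: r,
                              lambda a, c: a + c)
```

As with the natural numbers, the framework is fixed by the shape of the data; only f, g, and b change from one function to the next.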
Minimization:
Let us take a function f(x). Suppose there is at least one value of x which makes f(x) = 0. Now suppose that we wanted to find the least value of x which made f(x) = 0. There is an obviously effective method for doing so. We know that x is a natural number. So we set x = 0, and then compute f(x); if we get 0 we stop, having found the least x which makes f(x) = 0; but if not, we try the next natural number 1. We try 0,1,2,3... until we reach the first value which makes f(x) = 0. When we know that there is such a value, then we know that this method will terminate in a finite time with the correct answer. If we say that g(x) is a function that computes the least x such that f(x) = 0, then we know that g is computable. We will say that g is produced from f by minimization.
The only catch is that we don't always know that there is any x which makes f(x) = 0. Hence the project of testing 0,1,2,3... may never terminate. If we run the test anyway, that is called unbounded minimization. Obviously, when g is produced by unbounded minimization, it is not effectively computable.
If we don't know whether there is any x which makes f(x) = 0, then there is still a way to make function g effectively computable. We can start testing 0,1,2,3... but set an upper bound to our search. We search until we hit the upper bound and then stop. If we find a value before stopping which makes f(x) = 0, then g returns that value; if not, then g returns 0. This is called bounded minimization.
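Both searches can be sketched in a few lines of Python; the function names are illustrative, and the convention of returning 0 on a failed bounded search follows the text above:

```python
# Minimization: search for the least x with f(x) == 0.
# unbounded_min may loop forever if no such x exists; bounded_min
# always halts but may fail to minimize.

def unbounded_min(f):
    x = 0
    while f(x) != 0:   # non-termination risk: no guarantee a root exists
        x += 1
    return x

def bounded_min(f, bound):
    for x in range(bound + 1):
        if f(x) == 0:
            return x
    return 0           # convention from the text: 0 signals "not found"

least_root = unbounded_min(lambda x: (x - 3) * (x - 7))  # least root is 3
```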
(We could define functions f and g more generally so that each took any number of arguments, but again for simplicity we will not.)
Of course, as a building operation we must build only with computable functions. So we build g by minimization only if f(x) is already known to be computable.
While unbounded minimization has the disadvantages of a partial function which may never terminate, bounded minimization has the disadvantage of sometimes failing to minimize.
If you know any programming language, we can make a helpful comparison. Recall the distinction between loops that iterate a definite number of times (e.g. Pascal FOR loops) and those that iterate indefinitely until some special condition is met (e.g. Pascal REPEAT and WHILE loops). Let loops of these two kinds be given the task of testing natural numbers in search of the least value of x in the function f(x) = 0. For each given candidate number, n, starting with 0 and moving upwards, the loop checks to see whether f(n) = 0. The loops which iterate a definite number of times will test only a definite, finite number of candidates. To run such a loop is to take so many steps toward the computation of function g(x). This is bounded minimization at work. Indefinitely iterating loops represent unbounded minimization: they will keep checking numbers until they find one which makes the value of function f equal zero, or until the computer dies of old age or we turn it off.
In general we will have no way of knowing in advance whether either kind of loop will ever discover what it is looking for. But we will always know in advance that a bounded search will come to an end anyway, while an unbounded search may or may not. That is the decisive difference for effective computability.
Suppose an unbounded search does terminate with a good value for x. Since the program did halt, it must have halted after a definite, finite number of steps. That means that we could have done the same work with a bounded search, if only we had known how large to make the upper bound. (In fact, we could have done the whole search without loops at all, if only we had known how many times to repeat the incrementing statements.) This fact is more important for theory than for practice. For theory it means that whatever unbounded minimization can actually compute can be computed by bounded minimization; and since the latter is effective, the ineffectiveness of the former is significantly mitigated. (We can increase the bound in bounded minimization to any finite number, in effect covering the ground effectively that unbounded minimization covers ineffectively.) For practical computation this means little, however. Since we rarely know where to set the upper bound, we will use unbounded searches even though that risks non-termination (infinite loops). Our job is not made any easier by knowing that if our unbounded search should ever terminate, then a bounded search could have done the same work without the risk of non-termination.
Some terms will help us ring up a sub-total. If we allow ourselves to use composition, primitive recursion, and bounded minimization as our building operations, then the functions we can build are called primitive recursive (even if we did not use the building operation of primitive recursion). If we add unbounded minimization to our toolkit, we can build a larger class of functions that are called general recursive or just plain recursive. More precisely, functions are general recursive only when they use unbounded minimization that happens to terminate. Functions using unbounded minimization that does not terminate are called partial recursive because they are partial functions.
Church's Thesis refers to general recursive functions, not to primitive recursive functions. There are effectively computable functions that are not primitive recursive, but that are general recursive (e.g. Ackermann's function). But the open challenge latent in Church's Thesis has elicited no case of an effectively computable function that is not general recursive (if general recursive subsumes and includes the primitive recursive).
The three classes nest, each contained in the next:

    Partial recursive functions
        ⊃ General recursive functions
            ⊃ Primitive recursive functions
The set of general recursive functions is demonstrably the same as the set of Turing computable functions. Church's Thesis asserts indemonstrably that it is the same as the set of intuitively computable functions. In Douglas Hofstadter's terms, the primitive recursive functions are the BlooP computable functions, and the general recursive functions are the terminating FlooP computable functions. In Pascal terms, the primitive recursive functions are the FOR loop computable functions, while the general recursive functions are the terminating WHILE and REPEAT loop computable functions. The partial recursive functions that are not also general or primitive recursive are the WHILE and REPEAT (FlooP) functions that loop forever without terminating.
Recursive function theory identifies the class of functions that are Turing computable (just as lambda calculus does): the class of recursive functions is built from simple, intuitively effective functions (ones that reach a solution in finite time) using simple techniques.
There are three building operations used to build up all functions that are computable and effective:
- Composition: an argument is replaced by a function.
- In some cases composition shows the interchangeability of software and data.
- A function is defined using primitive recursion as follows:
h(x,0) = f(x)
h(x,s(y)) = g(x,h(x,y))
where f and g are known computable functions.
- In the second equation, s(y) enforces that the second argument is greater than zero. In the minimal case the first equation is used, and in all other cases the second equation is used.
- Its notion is similar to that of mathematical induction.
- A function can also be defined by a series of such equations.
E.g., the factorial function can be written as
f(n) = 1 when n = 1
f(n) = n(f(n-1)) when n > 1
- Structural recursion can be used for other structures too, apart from natural numbers only, e.g. binary trees.
- g(x), which computes the least x such that f(x) = 0, is said to be obtained from f by minimization.
Unbounded Minimization:
If we do not know that there is an x which makes f(x) = 0 and still run the test, then this minimization is called unbounded minimization. The g thus obtained will not be effectively computable.
We can make g in such cases effectively computable by keeping an upper bound on the value of x. This is called bounded minimization.
Disadvantages:
unbounded minimization: a partial function which may never terminate.
bounded minimization: sometimes fails to minimize.
Analogy:
a *while* loop would be unbounded minimization and a *for* loop would be bounded minimization
An unbounded search may terminate after a finite number of steps; hence the same value could have been calculated using bounded minimization. For theory this means the ineffectiveness of the unbounded case is significantly mitigated; for practical computation it is not of much help.